Operation And Maintenance Manual: Monitoring Alarms And Capacity Planning Recommendations For Singapore Cloud Storage Servers

2026-05-11 15:30:19

overview and scope of application

Applicable objects: block storage, object storage, and file service nodes deployed in the Singapore region (for example AWS ap-southeast-1 or Alibaba Cloud Singapore).
Goal: keep storage available, make capacity predictable, and make alerting operational and automated. This article uses Prometheus, Grafana, and Alertmanager as the example monitoring stack and includes concrete expansion and temporary mitigation steps.

monitoring item collection and deployment steps (instance level)

Steps:
1) Install node_exporter on each storage server (Debian/Ubuntu): sudo apt update && sudo apt install -y prometheus-node-exporter
2) Configure the Prometheus scrape job: add the following under scrape_configs in prometheus.yml, replacing ip with the server address, then restart Prometheus with sudo systemctl restart prometheus.
     - job_name: 'nodes'
       static_configs:
         - targets: ['ip:9100']
3) Collection items: disk usage (/, /data), inode usage, disk latency (from iostat, or derived from the node_exporter node_disk_*_time_seconds_total and node_disk_*_completed_total counters), network bandwidth, CPU, memory, disk queue length, and number of open file handles.
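node_exporter does not export a ready-made latency gauge; average latency is usually derived from its cumulative counters (time spent on I/O divided by operations completed). The following is a minimal sketch of that arithmetic, assuming two counter snapshots have already been scraped. The dictionary keys (read_time_s, write_time_s, reads, writes) are hypothetical names chosen for this example, not node_exporter metric names.

```python
def avg_io_latency_ms(prev, curr):
    """Average latency per I/O operation, in milliseconds, between two snapshots.

    Each snapshot is a dict of cumulative counters (hypothetical keys):
      read_time_s / write_time_s : seconds spent on reads / writes
      reads / writes             : operations completed
    """
    d_time = (curr["read_time_s"] - prev["read_time_s"]) + (
        curr["write_time_s"] - prev["write_time_s"]
    )
    d_ops = (curr["reads"] - prev["reads"]) + (curr["writes"] - prev["writes"])
    if d_ops <= 0:
        # No I/O completed in the interval (or a counter reset): report zero.
        return 0.0
    return d_time / d_ops * 1000.0


# Example snapshots taken one scrape interval apart.
before = {"read_time_s": 10.0, "write_time_s": 20.0, "reads": 1000, "writes": 2000}
after = {"read_time_s": 12.0, "write_time_s": 23.0, "reads": 1100, "writes": 2150}
latency_ms = avg_io_latency_ms(before, after)  # 5 s over 250 ops, about 20 ms
```

In Prometheus itself the same ratio would normally be computed with rate() over the corresponding counters rather than in application code.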

object storage and gateway monitoring

Steps:
1) For S3-compatible storage, enable access logging on the storage side, deliver the logs to a dedicated bucket, parse them with Fluentd or Fluent Bit, and either expose the derived metrics to Prometheus or ship the logs directly to Elasticsearch.
2) Key indicators: PUT/GET 4xx and 5xx rates, p95/p99 response latency, sharding/replication lag, object count growth rate, and the number of lifecycle hot/cold transitions.
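As an illustration of how the error-rate and latency indicators can be derived from parsed access-log records, here is a small sketch. It assumes the log pipeline has already produced a list of HTTP status codes and a list of response times; the function names are hypothetical, and the percentile uses the simple nearest-rank method, which may differ slightly from what a metrics backend reports.

```python
import math


def error_rate(statuses, low=500, high=599):
    """Fraction of requests whose status code falls in [low, high]."""
    if not statuses:
        return 0.0
    bad = sum(1 for s in statuses if low <= s <= high)
    return bad / len(statuses)


def percentile(values, p):
    """Nearest-rank percentile; p is an integer between 1 and 100."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic sample: 1000 requests, 5 of them failing with 500.
statuses = [200] * 995 + [500] * 5
latencies_ms = list(range(1, 101))

rate_5xx = error_rate(statuses)        # 5 / 1000
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Passing low=400, high=499 gives the 4xx rate with the same function.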

alarm rules and threshold recommendations (example)

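Example Prometheus rules (metric names such as disk_usage_percent are illustrative and would come from recording rules or an exporter):
1) disk_usage_percent > 80 for 5m → warning; > 90 for 2m → critical.
2) inode_usage > 90% for 5m.
3) disk_io_avg_latency_ms > 50 for 5m.
4) s3_5xx_rate > 0.5% for 10m.
Rule writing reference (fires when less than 20% of /data is free, that is, more than 80% is used):
  - alert: diskalmostfull
    expr: (node_filesystem_avail_bytes{mountpoint="/data"} / node_filesystem_size_bytes{mountpoint="/data"}) * 100 < 20
    for: 5m
    labels:
      severity: warning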
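The "for" duration in these rules means the condition must hold continuously before the alert fires, which is what filters out brief spikes. The sketch below approximates that behaviour for the disk usage thresholds, assuming one sample per evaluation interval (60 seconds here). The function names and the sample-count approximation are assumptions made for illustration; Prometheus tracks pending alerts internally and does not work on lists like this.

```python
def breached_for(samples, threshold, duration_s, interval_s=60):
    """True if the newest samples stayed above threshold for duration_s.

    samples: readings ordered oldest first, one per evaluation interval.
    Covering duration_s takes duration_s // interval_s + 1 consecutive readings.
    """
    needed = duration_s // interval_s + 1
    if len(samples) < needed:
        return False
    return all(value > threshold for value in samples[-needed:])


def disk_severity(usage_percent, interval_s=60):
    """Map recent disk usage readings to 'critical', 'warning', or 'ok'."""
    if breached_for(usage_percent, 90, 120, interval_s):   # > 90 for 2m
        return "critical"
    if breached_for(usage_percent, 80, 300, interval_s):   # > 80 for 5m
        return "warning"
    return "ok"


sustained = disk_severity([85, 85, 85, 85, 85, 85])   # above 80 for a full 5m
spike = disk_severity([70, 95, 70])                   # one-sample spike
urgent = disk_severity([85, 85, 85, 92, 92, 92])      # above 90 for 2m
```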

alarm routing and receiver configuration

Steps:
1) Configure routes in Alertmanager: route to Slack, email, PagerDuty, or SMS by severity, team, and service labels.
2) Configure notification templates and suppression: use inhibit rules or silences so that, for example, short-lived I/O peaks are muted for 15 minutes.
3) Test the pipeline: use amtool or curl to fire a simulated alert and confirm that both the primary and the cc'd receivers get it.
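The routing test can also be scripted. The sketch below builds a synthetic alert and posts it to the Alertmanager v2 API endpoint /api/v2/alerts. The label values (RoutingTest, storage, object-storage) and the default base URL http://localhost:9093 are assumptions for illustration and need to match the labels your routes actually select on.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(severity="warning", team="storage", minutes=5):
    """Build a one-element alert list in the shape the v2 alerts API accepts."""
    now = datetime.now(timezone.utc)
    return [
        {
            "labels": {
                "alertname": "RoutingTest",   # hypothetical name for this test
                "severity": severity,
                "team": team,
                "service": "object-storage",  # hypothetical service label
            },
            "annotations": {"summary": "Synthetic alert for route verification"},
            "startsAt": now.isoformat(timespec="seconds"),
            # The alert resolves on its own once endsAt has passed.
            "endsAt": (now + timedelta(minutes=minutes)).isoformat(timespec="seconds"),
        }
    ]


def send_alerts(alerts, base_url="http://localhost:9093"):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling send_alerts(build_test_alert(severity="critical")) against a test Alertmanager should produce a notification on whichever receiver the critical route points to; repeating it per severity and team exercises each branch of the routing tree.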

alarm handling (runbook) and quick handling commands

General process: receive an alarm → log in to the affected host → check top, df -h, iostat, and vmstat → determine whether the cause is a sudden spike or long-term growth.
Quickly free up space:
1) Trim the systemd journal under /var/log: sudo journalctl --vacuum-time=3d
2) Clean temporary directories. Deleting everything with sudo rm -rf /tmp/* can break running processes, so prefer removing only stale files, for example: sudo find /tmp -type f -mtime +7 -delete
3) Delete old backups or migrate them to cold storage (example: aws s3 mv /backup s3://cold-bucket --recursive --storage-class GLACIER).
Temporary solution for capacity expansion: attach and mount a new disk, rsync the data to it, and update /etc/fstab.

capacity planning steps (detailed how-to guide)

1) Data collection: export daily used_bytes, object_count, and daily_ingest_bytes for the past 90 to 180 days. Prometheus or a cloud monitoring API (such as AWS CloudWatch) can export the data as CSV.
2) Calculate the daily growth rate: fit a linear regression, or take the average daily increment over the last 30 days, (last - first) / days.
3) Forecast and safety factor: size for the 95th percentile of the forecast so that business peaks are covered, then add 20%-30% strategic redundancy (up to 50% for critical workloads).
4) Develop a retention and tiering policy: keep data in hot storage for 30 days and in cold storage for 90 to 365 days, enable lifecycle rules for automatic transition, and document the policy in the CMDB.
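The growth-rate and safety-factor steps can be sketched as follows, assuming a list of daily used_bytes readings has already been exported. The function names and the 25% default redundancy are illustrative choices within the 20%-30% range mentioned above, not part of any monitoring tool.

```python
def linear_slope(daily_used):
    """Least-squares slope of usage against day index (bytes per day)."""
    n = len(daily_used)
    if n < 2:
        raise ValueError("need at least two daily readings")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(daily_used))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def average_increment(daily_used):
    """Average daily increment: (last - first) / days elapsed."""
    return (daily_used[-1] - daily_used[0]) / (len(daily_used) - 1)


def required_capacity(current_used, daily_growth, horizon_days, redundancy=0.25):
    """Projected usage at the horizon plus a redundancy margin."""
    return (current_used + daily_growth * horizon_days) * (1 + redundancy)


def days_until_full(current_used, capacity, daily_growth):
    """Days left before capacity is reached; None if usage is not growing."""
    if daily_growth <= 0:
        return None
    return max(0.0, (capacity - current_used) / daily_growth)


# Synthetic series: 30 daily readings growing by 10 units per day.
history = [100 + 10 * day for day in range(30)]
slope = linear_slope(history)
increment = average_increment(history)
target = required_capacity(current_used=1000, daily_growth=10, horizon_days=90, redundancy=0.2)
runway = days_until_full(current_used=1000, capacity=2000, daily_growth=10)
```

On a perfectly linear series the regression slope and the average increment agree; on noisy real data the regression is less sensitive to an unusual first or last day.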

capacity expansion operation (block storage/cloud disk and file system)

Cloud disk expansion (taking AWS as an example):
1) Resize the volume: aws ec2 modify-volume --volume-id vol-xxx --size 200 --region ap-southeast-1
2) Check on the instance with sudo lsblk. If the partition needs to grow: sudo growpart /dev/xvdf 1. Then grow the file system: for XFS, sudo xfs_growfs /mountpoint; for ext4, sudo resize2fs /dev/xvdf1. Device names vary by instance type (for example /dev/nvme1n1 on NVMe-based instances), so use the name that lsblk reports.
Add a new disk and migrate: mount the new disk → rsync -av /data/ /mnt/newdata/ → update /etc/fstab → restart the service and switch over gradually.

q&a 1

Question: how can false 5xx alarms for object storage be avoided in the Singapore region?

Answer: the key is a percentage-based threshold combined with short-term suppression. Use the 5xx request rate (5xx_count / total_requests) as the indicator and alert only when it stays above a threshold such as 0.5% for 10 minutes. Also suppress false alarms caused by short deployment windows (for example, silence alerts while deploy_tag=true), and correlate request latency with the back-end error rate to decide whether the fault is real.

q&a 2

Question: which historical window gives more accurate capacity forecasts?

Answer: a window of 90 to 180 days is usually used, because it captures both seasonality and recent trends. For fast-growing businesses, calculate the 30-day and 90-day growth rates in parallel, plan with the more conservative (higher) value, and retain 20%-30% redundancy. Adjust the forecast temporarily around promotions or migration windows.
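A minimal sketch of the parallel-window comparison, assuming daily usage readings ordered oldest first. "Conservative" is interpreted here as the higher of the two growth rates, since over-provisioning is the safer error for capacity; the function names are hypothetical.

```python
def window_rate(daily_used, window_days):
    """Average daily growth over the most recent window_days days."""
    recent = daily_used[-(window_days + 1):]
    if len(recent) < 2:
        raise ValueError("need at least two daily readings")
    return (recent[-1] - recent[0]) / (len(recent) - 1)


def conservative_rate(daily_used, short_days=30, long_days=90):
    """Higher of the short-window and long-window growth rates."""
    return max(window_rate(daily_used, short_days), window_rate(daily_used, long_days))


# Synthetic series: slow growth for 60 days, then growth triples for 30 days.
usage = list(range(61)) + [60 + 3 * day for day in range(1, 31)]
rate_30d = window_rate(usage, 30)
rate_90d = window_rate(usage, 90)
planning_rate = conservative_rate(usage)
```

Here the 30-day window picks up the recent acceleration that the 90-day average smooths out, so the planning rate follows the shorter window.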

q&a 3

Question: what should be the first step when a disk suddenly raises a high-I/O alarm?

Answer: first check the traffic and the processes involved. Log in to the host and run iostat -x 1 5, iotop, and ps aux --sort=-%cpu to determine whether a backup, scan, or batch job is responsible. If it is an expected task, rate-limit it or migrate it first; if it is an abnormal write, find the process generating large files and temporarily stop the service. If necessary, move hot data off to the cold disk.
